Image segmentation

ABSTRACT

In segmenting multidimensional image data, the data is represented as a set of points in a vector space which is the product of the vector space of pixel values and the vector space of pixel locations. Segments are determined by the result of clustering of the points, typically by simulating gravitational clustering of the points considered as point masses. From time to time the points are merged according to a proximity criterion, typically defined by mutual occupancy of a pre-defined hypercube in the vector space. The mapping of original pixels to points is tracked during the simulation process, so that pixels may be mapped to segments.

[0001] This invention relates to image segmentation.

[0002] There are many methods of image segmentation in existence. Classical methods of segmenting an image described by a one-dimensional parameter (e.g. luminance) fall into two categories, those based on edge detection (e.g. the Sobel operator) and those based on region detection (e.g. sieves, watershed transforms). Much less work has been done on segmentation of images described by two or more parameters (e.g. colour components, motion vectors). For this purpose, histogram processing is often used, but then the spatial (or temporal) localization of the regions is ignored.

[0003] It is an object of certain aspects of the present invention to provide improved methods of image segmentation utilising two or more parameters.

[0004] Accordingly, the present invention consists in one aspect in a method of processing multidimensional image data, characterised in that the data is represented as a set of point masses in a vector space which is the product of the vector space of pixel values and the vector space of pixel locations.

[0005] The invention will now be described by way of example, with reference to the accompanying drawings, in which:

[0006]FIG. 1 is a block diagram illustrating the invention;

[0007] FIGS. 2 to 8 are schematic diagrams illustrating a method of image data processing according to an embodiment of the invention; and

[0008] FIGS. 9 to 14 are schematic diagrams illustrating a method of image data processing according to another embodiment of the invention.

[0009] A method of image segmentation by gravitational clustering will now be described. First, the image (or image sequence) is represented as a set of point masses in multidimensional space, with one point for each pixel. The dimensionality N of the space is the sum of two numbers, S, which is the number of spatial and/or temporal dimensions on which the image is defined, and P, which is the dimensionality of the pixels themselves. Usually, S is equal to 2 if single images are being considered, or 3 if a sequence of images is being processed together and the temporal dimension is being considered. The value of P depends on what is being used to describe the image. In the simplest case of a luminance image, P=1. If the image is described by all three colour components, then P=3. In general, several numerical values may be attached to each pixel, for example:

[0010] Luminance values

[0011] Colour or colour-difference values

[0012] Motion vectors

[0013] Values of the above from multiple cameras

[0014] Higher-level results such as noise or texture estimates

[0015] These values may all be taken into account at once by suitable choice of the dimension P.

[0016] It is the object of the invention to provide a segmentation algorithm which takes into account all the available information present in the image.

[0017] Having represented the image in N-dimensional space, the algorithm works by simulating the action of gravitational clustering of the point masses. From time to time, the gravitational simulation is halted and the point masses are grouped into clusters according to some proximity criterion. Each cluster then represents a region in the segmented image, whose P-dimensional pixel value is given by the corresponding coordinates of the cluster.

[0018] There now follows a description of a practical implementation of the method. The main steps are shown in the block diagram which forms FIG. 1.

[0019] The steps will now be described in turn.

[0020] Normalize

[0021] This preprocessing step determines the relative scale of each of the N dimensions of the point mass coordinates. One method which has been found to work well is that each dimension is normalized to have a constant standard deviation. In the case of the first S coordinates, the original standard deviation can simply be calculated from the picture dimensions, while the standard deviations of the remaining P coordinates could be calculated by a single pass through all the pixels. A refinement to this method would be to force the normalization factors to be constant across sets of similarly scaled dimensions. For example, if two of the pixel dimensions are the horizontal and vertical components of motion vectors, the normalization factor for these two dimensions could be forced to be equal.

[0022] Drift

[0023] This operation simulates the effect of points drifting together under the influence of gravity. The simulation is carried out iteratively, using a short time interval. Each iteration works as follows:

[0024] For each point mass X, the gravitational influence on X of each of the surrounding points is calculated and accumulated, and then X is moved according to the calculated sum.

[0025] This step may be repeated several times before going on to the Cluster step.

[0026] There are two possibilities for organizing the data in the Drift step:

[0027] (i). the new position of each point is stored in a separate memory, so that only the old positions of the points are used when calculating the influence of each point, or

[0028] (ii). the new position of each point overwrites the old position as soon as it is calculated.

[0029] While (i) may seem to be an ideal approach, in practice we have found that (ii) works well and saves memory.

[0030] One problem with the drift phase is that the computational cost of calculating the influence of every point on every other point is currently prohibitive. It is therefore necessary to restrict the calculation to points that are known, or at least likely, to have a significant influence on the current point X. At the beginning of the process, the positions of the points in the S spatial dimensions follow a known pattern, so it is easy to ensure that all points that are within a certain distance in both those two dimensions are considered. Thus, for example, in a raster-scanned picture, a window covering a certain number of television lines could be used to ensure that every pixel within a certain distance of X will be considered. However, as the clustering process proceeds, none of the coordinates follows a known pattern and there is no guarantee that points within a certain distance will also be within a certain window in scanning order. Nevertheless, the fact that the number of points diminishes as the clustering process proceeds, together with a data structure that does not re-order clusters, have been found in practice to give satisfactory results while limiting the amount of processing to a reasonable level.

[0031] The simulation used in the invention calculates the total force on the point and then calculates the motion of the point with the following two assumptions:

[0032] (1) the point starts at rest, and

[0033] (2) the time interval is such that the change in the force due to the change in displacement can be ignored.

[0034] These assumptions simplify the calculations but lead to one of two problems, according to the “gravitational constant”, or equivalently the unit of time, that is applied in the simulation. Either

[0035] (a) the motion is too slow, or

[0036] (b) points that are close to each other move by more than the distance between them, producing instability in the simulation.

[0037] A solution that has been found to work in practice is to use a relatively long time constant, risking problem (b), but then additionally to limit the distance moved to half the distance between X and its nearest neighbour.

[0038] Cluster

[0039] This step attempts to identify clusters of points, and to replace each cluster by a single point whose mass is the sum of the individual point masses. To do this, the N-dimensional space is divided into equal hypercubes (bins) whose dimensions correspond to the level of proximity at which clustering is desired to take place.

[0040] In principle, each hypercube is searched for points that fall within it, and the exact coordinates of all such points are averaged (weighted by mass), while their masses are summed. At the end of the process there will then be at most one point within each bin.

[0041] In practice, the space of bins is very sparsely populated, and a much more efficient approach is to take each point in turn and test points that are likely to be near by (using the same approach as in the Drift phase) to see if they fall in the same bin.

[0042] Test for Completion

[0043] There are several possible tests for completion of the processing. The test may be combined with the output processing, indicating completion when some property of the output picture has been reached. Alternatively, the test may simply count the number of clusters, indicating completion when the count reaches or falls below a predetermined number.

[0044] Output Processing

[0045] There are two possible goals to the segmentation process. One is to produce a list of the pixels in each segment. This is readily achieved by maintaining a pointer to a cluster for each original pixel address. As clusters are merged, the pointers are updated so that the destination of every pixel is tracked.

[0046] The other goal is to produce an output picture with new pixel values that are identical for every pixel within a segment and whose value is in some way representative of that segment. Three possible methods of calculating these pixel values have been identified:

[0047] 1. Use the values that the clusters already possess

[0048] 2. Take the values that the clusters already possess and renormalize them so that the standard deviation of the values in each dimension equals the corresponding original standard deviation

[0049] 3. Replace the values with the mean, or some other representative value, of the original pixel values in each cluster

[0050] Formal Descriptions

[0051] There now follows a formal description of more important aspects of the method.

[0052] The original picture, at time t=0, consists of M₀ point masses, one for each pixel, whose spatial/temporal coordinates are

g _(i) ^(j)(0), j=1 . . . S, i=1 . . . M ₀

[0053] and whose pixel values are

g _(i) ^(S+j)(0), j=1 . . . P, i=1 . . . M ₀

[0054] For example, if the image to be segmented is two-dimensional and has M_(x) pixels per line and M_(y) lines, then

[0055] S=2

[0056]M ₀ =M _(x) M _(y)

[0057]g _(i) ¹(0)=i mod M _(x)

[0058]g _(i) ²(0)=└i/M _(x)┘

[0059] If the pixels in that image are luminance and colour difference values Y_(i), U_(i) and V_(i), then

[0060] P=3

[0061]g _(i) ³(0)=Y _(i)

[0062]g _(i) ⁴(0)=U _(i)

[0063]g _(i) ⁵(0)=V _(i)

[0064] In general, the picture at time t consists of M_(t) points, sometimes also referred to as clusters, whose coordinates are

g _(m) ^(j)(t), j=1 . . . N, m=1 . . . M _(t)

[0065] and the m^(th) cluster consists of pixels whose original (t=0) coordinates i are members of the set C_(m)(t), with ${\bigcup\limits_{m = 1}^{M_{t}}\quad {C_{m}(t)}} = \left\{ {i,{i = {1\quad \ldots \quad M_{0}}}} \right.$

[0066] and

C _(m)(t)∩C _(m′)(t)=Φ, ∀m, m′ ε{1 . . . M _(t) }, m≠m′

[0067] In other words, the sets C_(m)(t) cover the set of original pixels.

[0068] Cluster m then consists of pixels whose original spatial coordinates are

g _(i) ^(j)(0), j=1 . . . S, iεC _(m)(t)

[0069] and whose current pixel values are

g _(i) ^(S+j)(t), j=1 . . . P, iεC _(m) (t)

[0070] The mass of the m^(th) cluster is defined as

W _(m)(t)=|C _(m)(t)|

[0071] i.e. the number of elements in the set C_(m)(t). We also define the function which maps original pixels to clusters:

γ_(i)(t)=C _(m)(t)|iεC _(m)(t)

[0072] Note that, at time t=0, the initial assigment of pixels to clusters of unit mass implies the following:

[0073]C _(m)(0)={m}

[0074]W _(m)(0)=1

[0075] γ_(m)(0)=C _(m)(0)

[0076] The operation of the drift and cluster processes described above will now be defined.

[0077] Drift Step—Formal Description

[0078] In one drift iteration, the coordinates of the clusters are updated as follows:

g _(i) ^(j)(t+1)=g _(i) ^(j)(t)+Δ_(i) ^(j)

[0079] where

Δ_(i) ^(j) =r _(i) ·d _(i) ^(j)

[0080] $\begin{matrix} {d_{i}^{j} = {G{\sum\limits_{k \neq i}^{\quad}\quad \frac{{W_{k}(t)}\left\lbrack {{g_{k}^{j}(t)} - {g_{i}^{j}(t)}} \right\rbrack}{\sum\limits_{l = 1}^{N}\quad \left\lbrack {{g_{i}^{l}(t)} - {g_{k}^{l}(t)}} \right\rbrack^{- \frac{3}{2}}}}}} \\ {r_{i} = {\min \left\{ {1,\frac{\frac{1}{2}\min \left\{ {\sqrt{\sum\limits_{j = 1}^{N}\quad \left\lbrack {{g_{i}^{j}(t)} - {g_{k}^{j}(t)}} \right\rbrack^{2}},{k \neq i}} \right\}}{\sqrt{\sum\limits_{j = 1}^{N}\quad \left( d_{i}^{j} \right)^{2}}}} \right\}}} \end{matrix}$

[0081] where G is a constant. The multiplier r_(i) has the effect of limiting the distance moved to half the distance between the cluster and its nearest neighbour.

[0082] During the drift process, the clusters remain the same:

C _(i)(t+1)=C _(i)(t) ∀i

[0083] Note that the above formulae assume that the influence of every point on every other point is considered. In practice, methods described in the text above may be used to limit the number of points considered. They also assume that no point is moved until all the update values have been calculated. Again, in practice it is possible to save memory by updating points in-place as described in the text above.

[0084] Cluster Step—Formal Description

[0085] For simplicity, we will assume that the bins are of size 1 in all dimensions. The clustering phase is then defined as follows: ${C_{m}\left( {t + 1} \right)} = {\left. {\bigcup\limits_{i = 1}^{M_{t}}\quad {C_{i}(t)}} \middle| \left\lfloor g_{i}^{j} \right\rfloor \right. = x_{m}^{j}}$

[0086] where

(x _(m) ^(j)), j=1 . . . N

[0087] is a set of integers defining the coordinates of the m^(th) bin.

[0088] The new parameters of the clusters are then defined as follows: ${g_{m}^{j}\left( {t + 1} \right)} = \frac{\sum\limits_{i|{{C_{i}{(t)}} \in {C_{m}{({t + 1})}}}}^{\quad}\quad {{W_{i}(t)}{g_{i}^{j}(t)}}}{\sum\limits_{i|{{C_{i}{(t)}} \in {C_{m}{({t + 1})}}}}^{\quad}\quad {W_{i}(t)}}$

[0089] Note that many bins will be empty. New parameters are only calculated for non-empty bins, and the list of bins can be “collapsed” by removing all empty bins. The new number of clusters, M^(t+1), is then defined as the number of non-empty bins.

[0090] Output Processing—Formal Description

[0091] The formal definition of the three methods of output processing described in the text above are as follows:

[0092] 1. h_(i) ^(j)=g_(i) ^(j)(T)

[0093] where T is the time corresponding to the final iteration. This is the method that retains the pixel values attributed to the clusters as the simulation proceeds. ${2.\quad h_{i}^{j}} = {{g_{i}^{j}(T)}\frac{\sigma \left\{ {{g_{m}^{j}(0)},{m = {1\quad \ldots \quad M_{0}}}} \right\}}{\sigma \left\{ {{g_{m}^{j}(T)},{m = {1\quad \ldots \quad M_{T}}}} \right\}}}$

[0094] where

[0095] σ{ }

[0096] is the standard deviation operator taking the weights Wm(t) into account. This is the method that rescales the pixel values to have the same standard deviation as the original component. ${3.\quad h_{i}^{j}} = \frac{\sum\limits_{m \in {\gamma_{i}{(T)}}}^{\quad}\quad {g_{m}^{j}(0)}}{{\gamma_{i}(T)}}$

[0097] This is the method that replaces the pixel values in each segment with the average of the original pixel values in that segment.

[0098] Further Explanation

[0099] The following diagrams may help to explain the clustering process. Two examples will be used.

[0100] In the first example, illustrated in FIGS. 2 to 8, a two-dimensional image is being segmented. The diagrams show the two spatial dimensions of the picture but do not show the pixel values.

[0101] First, the pixels, represented as point masses, are distributed uniformly across the picture. In this example, the bins which will be used for the clustering process are shown with the same spacing as the pixels. This is not strictly necessary but helps to make the explanation simpler.

[0102] The initial distribution is shown in FIG. 2.

[0103] After one Drift iteration, the pixels move slightly, as in FIG. 3.

[0104] After a second iteration, as in FIG. 4, the pixels move further and the signs of clustering appear.

[0105] The Cluster process is then invoked to merge points together, as shown in FIG. 5. Now there is only one point in each bin, but the mass of some of the points has increased, while the total mass of all the points remains the same.

[0106] After further Drift iterations the picture might look as in FIG. 6, which after the Cluster step would produce the picture shown in FIG. 7.

[0107] If each cluster is given a number, then by mapping the clusters to original pixels we can produce a segmentation map of the original picture, as shown in FIG. 8.

[0108] The second example, illustrated in FIGS. 9 to 14, has only one spatial dimension, which is plotted on the horizontal axis. The pixel values, which are also one-dimensional in this case, are plotted on the vertical axis. The initial picture might look like FIG. 9, after one or more Drift iterations, like FIG. 10, after a Cluster operation, like FIG. 11, after further drift steps, like FIG. 12, and after a further Cluster step, like FIG. 13.

[0109] This example serves to show that clusters can end up on top of one another in terms of their pixel values, while still referring to distinct areas of the original picture.

[0110] The resulting segmentation in this case would be as shown in FIG. 14. Note that the resulting segments need not be contiguous, which is a powerful feature of the algorithm.

[0111] Possible Modifications and Refinements

[0112] There are many possible modifications which could be made to the method described above. For example:

[0113] The initial mass assigned to each pixel need not be 1, but could be a value which expresses some already-known measure of importance of the pixel, or confidence in its correctness.

[0114] The renormalization that is described as method 2 of output processing, in which the original standard deviation of each component are restored, may additionally be carried out at various stages during the simulated clustering process.

[0115] Alternatively, a fixed repulsive term may be introduced into the simulation, to prevent the standard deviations of the components from diminishing.

[0116] Simpler gravitational models may be used. For example, points might always be moved by a fixed distance, or by a distance that is proportional to the mass of the attracting point.

[0117] A further simplification might be to allow points to be influenced only by their nearest neighbours, where the measure of nearness may additionally take mass into account. 

1. A method of processing multidimensional image data, characterized in that the data is represented as a set of point masses in a vector space which is the product of the vector space of pixel values and the vector space of pixel locations.
 2. A method of segmenting multidimensional image data according to claim 1, in which the segments are determined by the result of simulated gravitational clustering of the point masses.
 3. A method according to claim 2, in which the simulated gravitational clustering is carried out iteratively over short simulated time intervals.
 4. A method according to claim 2, in which the distance moved by each point is limited to a defined fraction the distance between the point and its nearest neighbor.
 5. A method according to claim 2, in which from time to time the point masses are merged according to a proximity criterion.
 6. A method according to claim 5, in which the proximity criterion is defined by mutual occupancy of a pre-defined hypercube in the vector space.
 7. A method according to claim 1, in which the mapping of original pixels to point masses is tracked during the simulation process, so that pixels may be mapped to segments.
 8. A method according to claim 1, in which the coordinate values of the point masses are assigned to pixels which map to those point masses.
 9. A method according to claim 1, in which the value assigned to the pixels in each segment is the average of the original values of all the pixels in that segment.
 10. A method according to claim 1, in which the coordinates of the point masses may from time to time be renormalized.
 11. A method according to claim 1, in which the initial values of the point masses are unity. 